Nature Biotechnology — Latest Matching Preprints

1

Carrierwave: A granular, incentive-aligned infrastructure for scientific communication

Bachelet, I.

2026-03-03 scientific communication and education 10.64898/2026.03.01.708795 medRxiv

Top 0.1%

44.2%

The peer-reviewed journal article imposes structural constraints on the dissemination, validation, and reuse of research outputs. Intermediate results, negative findings, methodological refinements, and replication attempts are systematically underrepresented in published literature, limiting visibility into ongoing research activity for both scientists and mission-driven funders. Here we present Carrierwave, an open infrastructure for continuous, granular scientific communication built on structured research objects (ROs), cryptographic provenance, blockchain-based attribution, and programmable incentive mechanisms. Each RO represents an atomic unit of scientific output -- a single experimental result, negative finding, dataset, protocol, or replication -- that is hashed for content integrity, stored in a persistent database, and optionally minted as an ERC-721 non-fungible token on the Ethereum blockchain. The system includes an on-chain bounty pool enabling funders to directly incentivize specific research activities, and an automated analysis layer that synthesizes disclosed ROs into continuously updated research landscape maps. We describe the system architecture, report on its implementation and deployment on Ethereum mainnet, and present a quantitative analysis of disease-specific publication frequency demonstrating the information latency problem that Carrierwave addresses. The distribution of publication frequency across disease areas is highly skewed, with the majority of conditions represented by fewer than four publications per year in high-impact biology journals. For diseases in the long tail, the interval between successive publications may span months or years. Publication frequency correlates poorly with disease burden, instead reflecting historical research community size and advocacy momentum. 
By reducing the unit of communication to the individual research object and eliminating editorial gatekeeping as a prerequisite for disclosure, Carrierwave increases the effective sampling rate of scientific activity in precisely the domains where publication-based visibility is most sparse. The system is live at https://carrierwave.org.
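The content-integrity step described in this abstract (each RO is "hashed for content integrity") can be illustrated with a minimal Python sketch. It assumes the RO is serialized as canonical JSON and hashed with SHA-256. The field names, the serialization, and the hash function are illustrative assumptions, not details stated in the abstract.

```python
import hashlib
import json


def research_object_digest(ro: dict) -> str:
    """Return a hex SHA-256 digest of a research object (hypothetical schema)."""
    # Canonical serialization: sorted keys and fixed separators, so that
    # logically identical objects always serialize to the same bytes.
    canonical = json.dumps(
        ro, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical RO; every field here is an assumption for illustration only.
example_ro = {
    "type": "negative_finding",
    "title": "No effect of compound X on marker Y",
    "values": [0.1, 0.2],
}
digest = research_object_digest(example_ro)
print(digest)
```

Any change to the object, however small, yields a different digest, which is what lets a stored or minted record be checked against the disclosed content later.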

2

DAMPA - accelerated and simplified design of probe panels for targeted metagenomics using pangenome graphs

Payne, M.; Tam, K. K.-G.; Rockett, R. J.; Basile, K.; Bowden, R.; Sintchenko, V.; Kok, J.; Golubchik, T.

2026-05-22 infectious diseases 10.64898/2026.05.15.26352859 medRxiv

Top 0.1%

39.6%

Targeted metagenomics, where samples are enriched for multiple organisms of interest using oligonucleotide probes, is a highly efficient sequencing methodology that is becoming standard practice for genomics of viruses and complex polymicrobial samples. Efficient enrichment critically requires probes that capture both conserved and highly diverse genomic regions without loss of sensitivity, and with uniform representation in the sequencing pool. Design of optimal probesets poses a challenge: existing computational methods use k-mer hashing to reduce over-abundant sequences, but scalability and efficiency drop with increasing numbers of genomes, while diverse sequences remain under-represented. Here we show that incorporating evolutionary distance to compress probes via a graph-based representation of multiple genomes across species, together with k-mer hashing, reduces overrepresentation of conserved sequences and yields more uniform coverage even of highly diverse loci. We make the method available in DAMPA, an open-source tool that generates probesets in seconds on a standard laptop.

3

CROPseq-multi: a versatile solution for multiplexed perturbation and decoding in pooled CRISPR screens

Walton, R. T.; Qin, Y.; Blainey, P. C.

2024-03-17 genomics 10.1101/2024.03.17.585235 medRxiv

Top 0.1%

38.9%

Forward genetic screens seek to dissect complex biological systems by systematically perturbing genetic elements and observing the resulting phenotypes. While standard screening methodologies introduce individual perturbations, multiplexing perturbations improves the performance of single-target screens and enables combinatorial screens for the study of genetic interactions. Current tools for multiplexing perturbations are limited by technical challenges and do not offer compatibility across diverse screening methodologies, including enrichment, single-cell sequencing, and optical pooled screens. Here, we report the development of CROPseq-multi (CSM), a CROPseq-inspired [1] lentiviral system to multiplex Streptococcus pyogenes (Sp) Cas9-based perturbations with versatile readout compatibility and high performance for both perturbation and barcode identification. CSM has equivalent per-guide activity to CROPseq and low lentiviral recombination frequencies. Dual-guide CSM libraries are constructed in a single, facile molecular cloning step that facilitates the use of unique molecular identifiers. CSM is compatible with enrichment screening methodologies, single-cell RNA-sequencing readouts, and optical pooled screens. For optical pooled screens, an optimized and multiplexed in situ detection protocol improves barcode counts 10-fold (for mRNA detection), enables detection of recombination events, and reduces the number of sequencing cycles required for decoding by 3-fold relative to CROPseq. CROPseq-multi-v2 (CSMv2) adds compatibility for detection methods based on T7 RNA polymerase in vitro transcription [2-5]. CSM provides a single system for CRISPR screens that is compatible with individual and combinatorial perturbations, diverse SpCas9-based perturbation technologies, and multiple high-content, single-cell phenotypic readouts.

4

ACCIO: An Assembly-Based Tool Enabling Plasmid Detection

Raabe, N. J.; Griffith, M. P.; Rangachar Srinivasa, V.; Waggle, K. D.; Sundermann, A. J.; Pless, L.; Snyder, G. M.; Brooks, M. M.; Van Tyne, D.; Harrison, L. H.

2025-11-02 infectious diseases 10.1101/2025.10.30.25338662 medRxiv

Top 0.1%

35.8%

Plasmids are extrachromosomal mobile genetic elements that often carry genes responsible for antimicrobial resistance. Plasmid epidemiology aims to track the evolution and spread of plasmids, but the field currently faces significant barriers that make practical implementation using whole genome sequence data difficult. Hybrid-assembled genomes remain the most reliable way to identify and track complete plasmids; however, most genomic surveillance data exists in the form of short-read sequencing, which lacks the resolution required to accurately resolve plasmids. Despite recent advances, long-read-only assemblies have not yet reached the consistency seen in hybrid assemblies. The ideal approach to plasmid epidemiology using whole genome sequence data would consider the limitations of sequencing technologies and the constraints of existing genomic surveillance infrastructure, in addition to the unique evolutionary biology of plasmids. Here, we present ACCIO (Assembly-based Circular Contig Identification for Outbreaks), a tool which creates a reference plasmid database and uses it to infer which plasmids, and genetically related plasmid groupings, are present in an input assembly (Illumina, Nanopore, or hybrid assembly). We validated ACCIO using an internal dataset of 303 plasmid-harboring bacterial clinical and surveillance isolates collected from a single acute tertiary care center. When highly related database plasmids were grouped together, ACCIO achieved 100% sensitivity and 92.1% positive predictive value (PPV) for detection of plasmid groups using hybrid assemblies, and comparably strong performance for Illumina (93.0% sensitivity, 86.6% PPV) and Nanopore (79.3% sensitivity, 91.4% PPV) assemblies. Evaluation on three external datasets yielded consistently high performance. Finally, when benchmarked against MOB-suite, a tool for reconstruction and typing of plasmids, ACCIO demonstrated superior performance across nearly all assembly types and plasmid grouping levels. By integrating database construction, clustering, and plasmid calling into a single workflow compatible with all major sequencing platforms, ACCIO is intended to help advance plasmid epidemiology beyond its current technological and infrastructural barriers.

3. Impact statement

Detecting and tracking plasmids--the mobile genetic elements often responsible for spreading antimicrobial resistance in hospital settings--is challenging, particularly when relying on short-read sequencing data alone. Short-read genome assemblies, despite widespread use in surveillance of bacterial pathogens, inherently lack the resolution required for plasmid analyses. Current bioinformatic methods struggle to identify whole plasmids from short-read assemblies alone, and often, hybrid assembly using both short- and long-read data is required for the robust analyses that are essential for tracking plasmids. To address these challenges, we developed ACCIO, a bioinformatics tool which utilizes input genome assemblies (short-read, long-read, or hybrid assemblies) to assess the plasmid content of clinical bacterial isolates for epidemiologic purposes. We validated its use against the recovery of circular plasmid sequences from hybrid assembled genomes as a gold standard method for determining plasmid content. Using a curated local database of 430 plasmid sequences, ACCIO provided accurate inferences of plasmid content from short-read (Illumina), long-read (Oxford Nanopore Technologies), and hybrid assemblies (both), ultimately facilitating genomic surveillance of plasmids regardless of sequencing technology. This work represents a meaningful step forward in advancing plasmid surveillance beyond the technological and infrastructural barriers that limit its broader expansion into healthcare and other settings.

4. Data summary

Short- and long-read sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under multiple BioProjects, and corresponding hybrid genome assemblies are available in GenBank. Accession numbers for all BioProjects, BioSamples, and SRA datasets are provided in Supplementary Data S1. All supporting data, software code, and experimental/analysis protocols are provided within the article or in supplementary data files. External validation of ACCIO used three external datasets (Cho et al. 2023, BioProjects PRJNA475751 and PRJNA874473, DOI: 10.1038/s41598-024-70540-1; Lipworth et al. 2024, BioProject: PRJNA604975, DOI: 10.1038/s41467-024-45761-7; Khezri et al. 2021, European Nucleotide Archive (ENA): PRJEB45084, DOI: 10.3390/microorganisms9122560).

List of External Software

MOB-suite (v3.1.9) - https://github.com/phac-nml/mob-suite
Skani (v0.2.2) - https://github.com/bluenote-1577/skani
Scipy (v1.16.1) - https://github.com/scipy/scipy
Pling (v2.0.0) - https://github.com/iqbal-lab-org/pling
MUMmer / NUCmer (v4.0.1) - https://mummer4.github.io/
Mash / Mash Screen (v2.3) - https://github.com/marbl/Mash
SPAdes (v3.15.5) - https://github.com/ablab/spades
Unicycler (v0.5.1) - https://github.com/rrwick/Unicycler
Flye (v2.9.5) - https://github.com/mikolmogorov/Flye
QUAST (v5.2.0) - https://github.com/ablab/quast
Kraken2 (v2.1.3) - https://github.com/DerrickWood/kraken2
CheckM (v0.4) - https://github.com/Ecogenomics/CheckM
Albacore/Guppy - [no longer officially hosted; was distributed by ONT]
Guppy - https://nanoporetech.com/software/other/guppy
Dorado - https://github.com/nanoporetech/dorado
Bowtie2 (v2.5.4) - https://github.com/BenLangmead/bowtie2
Minimap2 (v2.28) - https://github.com/lh3/minimap2
Biopython (v1.85) - https://biopython.org/
Pandas (v2.3.1) - https://pandas.pydata.org/
Plasme (v1.1) - https://github.com/HubertTang/PLASMe
BLAST (v2.17.0) - https://blast.ncbi.nlm.nih.gov/Blast.cgi
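
The sensitivity and PPV figures quoted in the ACCIO abstract follow the standard definitions, sketched below in Python. The counts in the example are hypothetical and are not taken from the study. They only show how the two metrics are derived from true positives (TP), false negatives (FN), and false positives (FP).

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly present plasmid groups that were detected: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined when TP + FN == 0")
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of reported plasmid groups that are truly present: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("PPV is undefined when TP + FP == 0")
    return tp / (tp + fp)


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fn, fp = 90, 10, 30
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"PPV         = {positive_predictive_value(tp, fp):.3f}")
```

Sensitivity falls when present plasmids are missed, while PPV falls when absent plasmids are called, so the two numbers can move independently across assembly types.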

5

Paired plus-minus sequencing is an ultra-high throughput and accurate method for dual strand sequencing of DNA molecules

Cheng, A. P.; Rusinek, I.; Sossin, A.; Widman, A. J.; Meiri, E.; Krieger, G.; Hirschberg, O.; Tov, D. S.; Gilad, S.; Jaimovich, A.; Barad, O.; Avaylon, S.; Rajagopalan, S.; Potenski, C.; Prieto, T.; Yuan, D. J.; Furatero, R.; Runnels, A.; Costa, B. M.; Shoag, J. E.; Al Assaad, M.; Sigouros, M.; Manohar, J.; King, A.; Wilkes, D.; Otilano, J.; Malbari, M. S.; Elemento, O.; Mosquera, J. M.; Altorki, N. K.; Saxena, A.; Callahan, M. K.; Robine, N.; Germer, S.; Evrony, G.; Faltas, B. M.; Landau, D. A.

2025-08-14 genomics Community evaluation 10.1101/2025.08.11.669689 medRxiv

Top 0.1%

34.7%

Distinguishing real biological variation in the form of single-nucleotide variants (SNVs) from errors is a major challenge for genome sequencing technologies. This is particularly true in settings where SNVs are at low frequency such as cancer detection through liquid biopsy, or human somatic mosaicism. State-of-the-art molecular denoising approaches for DNA sequencing rely on duplex sequencing, where both strands of a single DNA molecule are sequenced to discern true variants from errors arising from single stranded DNA damage. However, such duplex approaches typically require massive over-sequencing to overcome low capture rates of duplex molecules. To address these challenges, we introduce paired plus-minus sequencing (ppmSeq) technology, in which both DNA strands are partitioned and clonally amplified on sequencing beads through emulsion PCR. In this reaction, both strands of a double-stranded DNA molecule contribute to a single sequencing read, allowing for a duplex yield that scales linearly with sequencing coverage across a wide range of inputs (1.8-98 ng). We benchmarked ppmSeq against current duplex sequencing technologies, demonstrating superior duplex recovery with ppmSeq, with a rate of 44% ± 5.5% (compared to ~5-11% for leading duplex technologies). Using both genomic as well as cell-free DNA, we established error rates for ppmSeq, which had residual SNV detection error rates as low as 7.98 × 10⁻⁸ for gDNA (using an end-repair protocol with dideoxy nucleotides) and 3.5 × 10⁻⁷ ± 7.5 × 10⁻⁸ for cell-free DNA. To test the capabilities of ppmSeq for error-corrected whole-genome sequencing (WGS) for clinical application, we assessed circulating tumor DNA (ctDNA) detection for disease monitoring in cancer patients.
We demonstrated that ppmSeq enables powerful tumor-informed ctDNA detection at concentrations of 10⁻⁴ across most cancers, parts per million sensitivity in cancers with high mutation burden, and further increased sensitivity with higher sequencing depth. We then leveraged genome-wide trinucleotide mutation patterns characteristic of urothelial (APOBEC3-related and platinum exposure-related signatures) and lung (tobacco-exposure-related signatures) cancers to perform tumor-naive ctDNA detection, showing that ppmSeq can identify a disease-specific signal in plasma cell-free DNA without a matched tumor, and that this signal correlates with imaging-based disease metrics. Altogether, ppmSeq provides an error-corrected, cost-efficient and scalable approach for high-fidelity WGS that can be harnessed for challenging clinical applications and emerging frontiers in human somatic genetics where high accuracy is required for mutation identification.

6

RAPID: Evaluation of Cas12a Protospacer Nicking and Chimeric Reporters for PAM-free RNA and DNA diagnostics

Iwe, I.; Liu, F. X.; Corsano, A.; Da Silva, S. J. R.; Doucet, J.; Singh, S.; Lamothe, G.; Zayeni, R.; Nguyen, J.; Matthews, Q.; Vigar, J.; Bayat, P.; Simchi, M.; Bozovicar, K.; Charania, M.; Panfilov, S.; Li, X.; Mazzulli, T.; Tremblay, J. P.; Zhao, Y.; Green, A. A.; Li, Z.; Yao, S.; Pardee, K.

2025-07-14 infectious diseases 10.1101/2025.07.12.25331452 medRxiv

Top 0.1%

33.5%

CRISPR-Cas nucleases have revolutionized diagnostics and biotechnology by providing programmable specificity. Here, we extend the understanding of Cas12a biology with a screen that, unexpectedly, finds that Cas12a trans cleavage activity can be modulated by nicks in the protospacer in a position-dependent manner. Wanting to explore the impact of non-conventional trans cleavage substrates, we subsequently find that non-specific Cas12a cleavage can be significantly reduced with RNA and chimeric (mixed RNA/DNA) reporter sequences. Exploiting these features, we introduce RAPID (RNA/DNA Advanced chimeric, PAM-free, Integrated Nicking, Diagnostics), a PAM-independent nucleic acid detection platform. By strategically introducing a nick within the spacer region, RAPID expands Cas12a detection to include target RNAs, which can be ligated in situ to create a hybrid protospacer-target with trans cleavage activity matching conventional Cas12a. We then apply RAPID to detect single point mutations in ssDNA and RNA substrates, a challenge for traditional Cas12 and Cas13 systems. In combination with RT-LAMP, RAPID is used for PAM-free RNA detection in clinical samples, achieving sensitivity down to ~1 aM and 100% concordance with RT-qPCR.

7

Enabling Megascale Microbiome Analysis with DartUniFrac

Zhao, J.; McDonald, D.; Sfiligoi, I.; Lladser, M. E.; Patel, L.; Weng, Y.; Khatib, L.; Degregori, S.; Gonzalez, A.; Lozupone, C.; Knight, R.

2026-03-03 bioinformatics 10.64898/2026.03.01.708916 medRxiv

Top 0.1%

33.0%

We introduce a new algorithm, DartUniFrac, and a near-optimal implementation with GPU acceleration, up to three orders of magnitude faster than the state of the art and scaling to millions of samples (pairwise) and billions of taxa. DartUniFrac connects UniFrac with weighted Jaccard similarity and exploits sketching algorithms for fast computation. We benchmark DartUniFrac against exact UniFrac implementations, demonstrating that DartUniFrac is statistically indistinguishable from them on real-world microbiome and metagenomic datasets.
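
The weighted Jaccard similarity that this abstract connects to UniFrac can be written out directly. The Python sketch below computes the exact value for two non-negative weight vectors as the sum of element-wise minima divided by the sum of element-wise maxima. It is not the DartUniFrac algorithm, and how UniFrac is mapped onto such vectors is not stated in the abstract, so the input vectors here are purely hypothetical. Sketching methods aim to estimate this quantity without comparing full vectors.

```python
from typing import Sequence


def weighted_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact weighted Jaccard similarity: sum(min(x_i, y_i)) / sum(max(x_i, y_i))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("weights must be non-negative")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors are treated as identical by convention.
    return 1.0 if denominator == 0 else numerator / denominator


# Hypothetical weight vectors for two samples.
sample_a = [1.0, 0.0, 2.0]
sample_b = [1.0, 1.0, 0.0]
similarity = weighted_jaccard(sample_a, sample_b)
print(similarity, 1.0 - similarity)  # similarity and the corresponding distance
```

The exact computation costs time proportional to the vector length for every pair of samples, which is the cost that a sketch-based estimate is meant to avoid at large scale.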

8

Photolabile oligonucleotides combined with topologically imposed light gradients enable spatially resolved single-cell transcriptomics and epigenomics

Piscopio, R. A.; Chialastri, A.; Wang, C.; Godzik, M.; Heom, K. A.; Wang, W.; Wilson, M. Z.; Dey, S. S.

2025-10-02 genomics 10.1101/2025.10.01.679703 medRxiv

Top 0.1%

32.8%

The organization of cells within a tissue plays a critical role in tuning cellular function. Several methods have recently been developed to capture the transcriptome of cells while retaining spatial information. However, these genome-wide sequencing methods typically lack the spatial resolution of individual cells and are confined to quantifying positional information within predefined lattice locations, thereby failing to capture large sections of a tissue outside these regions. Further, these methods are generally limited to profiling fixed cells with reduced mRNA capture efficiency compared to standard scRNA-seq. In addition, existing methods lack modularity and cross-platform compatibility, thereby limiting most of these techniques from jointly profiling the epigenetic and transcriptomic state of individual cells. To overcome these limitations, we present scSTAMP-seq (single-cell Spatial Transcriptomic And Multiomic Profiling), an approach that employs cholesterol-tagged photolabile oligonucleotides that incorporate into cell membranes, enabling us to "stamp" the position of cells using spatially imposed light gradients prior to tissue dissociation and single-cell sequencing. Applied to live cells, scSTAMP-seq efficiently captures spatially resolved single-cell transcriptomes at high resolution for all cells within a field of view. Further, we demonstrate that light patterning enables dynamic spatial resolution, including the ability to map the position of individual cells. Finally, we show that scSTAMP-seq is modular and can be seamlessly integrated with various downstream single-cell sequencing technologies. We demonstrate this by performing scRNA-seq using plate- and droplet-based methods, and by performing joint epigenome and transcriptome sequencing from the same cell while preserving positional information. 
Collectively, these results demonstrate that scSTAMP-seq is a sensitive and high-throughput technology for mapping single-cell transcriptomes and epigenomes at the spatial resolution of individual cells.

9

Extensive length and homology dependent chimerism in pool-packaged AAV libraries

Lalanne, J.-B.; Mich, J. K.; Huynh, C.; Hunker, A.; McDiarmid, T. A.; Levi, B. P.; Ting, J. T.; Shendure, J.

2025-01-15 genomics 10.1101/2025.01.14.632594 medRxiv

Top 0.1%

32.7%

Adeno-associated viruses (AAVs) have emerged as the foremost gene therapy delivery vehicles due to their versatility, durability, and safety profile. Here we demonstrate extensive chimerism, manifesting as pervasive barcode swapping, among complex AAV libraries that are packaged as a pool. The observed chimerism is length- and homology-dependent but capsid-independent, in some cases affecting the majority of packaged AAV genomes. These results have implications for the design and deployment of functional AAV libraries in both research and clinical settings.

10

SpaceBio Knowledge Hub: A LiteratOmics Platform for Microgravity and Space Biology Research

Silva, J. C. F.; Vieira, A.; Chue Donahey, M. S.; Silva, S. M. d. C.; Veloso, T.; Lopes, A.; Sexson, N.; Barker, R.; Porterfield, D. M.; Silva, C. A.; Dias, R.

2026-07-14 scientific communication and education 10.64898/2026.07.13.737239 medRxiv

Top 0.1%

31.4%

Space biology literature is growing exponentially. Existing infrastructure has not kept pace with organizing, synthesizing, and disseminating this knowledge. We present SpaceBio Knowledge Hub (www.spacebio.space), an integrated digital ecosystem that combines artificial intelligence, real-time data integration, and open-access infrastructure to advance research, education, and collaboration in microgravity, space biology and space exploration. The platform applies AI-driven approaches including natural language processing, machine learning, and automated content generation to construct a semantic atlas of the field. The atlas reveals the hierarchical thematic organization underlying microgravity-induced biological responses, space mission infrastructure, planetary science, and astrobiology. As part of this effort, SpaceBio is moving toward the construction of a LiteratOmics framework for microgravity and space biology: a systematic, AI-enabled approach to mining, integrating, and structuring the primary literature generated by omics-driven spaceflight research, treating the scientific literature itself as a navigable data layer alongside genomic, transcriptomic, and proteomic datasets. Built on a scalable, cloud-based architecture with a user-centered interface, SpaceBio supports literature exploration, data integration, and knowledge discovery for researchers, educators, students, industry partners, and citizen scientists. The platform also functions as a community-building ecosystem. It integrates hands-on research initiatives, AI-generated educational content, pilot data science projects, and social responsibility programs that broaden participation without compromising scientific rigor. AI-enabled digital environments can transform fragmented literature into a navigable knowledge landscape.
SpaceBio accelerates research productivity, strengthens STEM education, and supports the global space life sciences community as human space exploration enters its most ambitious era.

11

Direct In-Sample Sequencing of the 3' Transcriptome Expands the Capabilities of Optical Pooled Screens

Honigfort, D.; Belda-Ferre, P.; White, D.; Sundararajan, K.; Dawood, M.; LeVieux, J.; Moreno, J.; Qi, X.; Metcalfe, K.; Naranbat, D.; Altomare, A.; Thompson, C.; Perez, C.; Lajoie, B.; Kwon, H.; Bhadha, P.; Rammel, T.; Rabalais, J.; Kellinger, M.; Kruglyak, S.; Arslan, S.; Previte, M.

2025-10-13 genomics 10.1101/2025.10.11.681797 medRxiv

Top 0.1%

31.0%

We present a platform that directly sequences single guide RNAs and endogenous 3'UTRs in fixed cells while simultaneously measuring protein abundance and cellular morphology. We demonstrate platform capability by performing optical pooled screening of CRISPR-perturbed lung cancer cells. This approach unites direct in-sample RNA sequencing with complementary phenotypic readouts, enabling comprehensive, scalable, and functional genomics analyses within a single experiment.

12

Denoising sparse microbial signals from single-cell sequencing of mammalian host tissues

Ghaddar, B.; Blaser, M. J.; De, S.

2022-06-30 genomics 10.1101/2022.06.29.498176 medRxiv

Top 0.1%

30.8%

We developed SAHMI, a computational resource to identify truly present microbial nucleic acids and filter contaminants and spurious false-positive taxonomic assignments from standard transcriptomic sequencing of mammalian tissues. In benchmark studies, SAHMI correctly identifies known microbial infections present in diverse tissues. The application of SAHMI to single-cell and spatial genomic data enables co-detection of somatic cells and microorganisms and joint analysis of host-microbiome ecosystems.

13

Compound Delivery of eVLPs Enhances Prime Editing for Targeted Genome Engineering and High-Throughput Screening

Langley, J.; Baudrier, L.; Curry, J.; Narta, K.; Todesco, H. M.; Potts, K.; Morrissy, S.; Mahoney, D. J.; Billon, P.

2025-08-11 genomics 10.1101/2025.08.11.669692 medRxiv

Top 0.1%

30.6%

Engineered virus-like particles (eVLPs) enable transgene-free ribonucleoprotein delivery for genome editing applications, yet optimized delivery strategies for high-throughput applications remain unexplored. Prime editing enables precise genomic modifications but suffers from limited efficiency that constrains its widespread adoption. Here, we present PRIME-VLP (Progressive Repeated Infections for Maximized Editing via Virus-Like Particles), a delivery strategy that enhances prime editing efficiency for both targeted genome engineering and high-throughput prime editing screening. PRIME-VLP leverages the temporal dynamics of eVLP-mediated editing through multiple sequential transductions with sub-saturating eVLP doses delivered at optimal intervals. This approach achieves 1.5- to 2.8-fold improvements in editing efficiency across diverse genomic targets and cell types. PRIME-VLP maintains high specificity without increasing off-target effects, compromising cellular viability, or causing transcriptional perturbations. By decoupling pegRNA and editor delivery through pegRNA-free eVLPs, PRIME-VLP enables pooled prime editing screens, circumventing the transgene silencing limitations of conventional lentiviral-based screens. Using a 6,000-pegRNA library targeting TP53, PRIME-VLP achieved 2.8-fold higher editing efficiency and improved reproducibility compared to conventional lentiviral delivery. An eVLP-based screen identified functional TP53 loss-of-function variants that confer resistance to MDM2 inhibition by Nutlin-3. This work expands the versatility of eVLPs beyond their current in vivo therapeutic applications, demonstrating their promise for high-throughput functional genomics by overcoming the delivery limitations of lentiviral systems.

14

Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform

Almogy, G.; Pratt, M.; Oberstrass, F.; Lee, L.; Mazur, D.; Beckett, N.; Barad, O.; Soifer, I.; Perelman, E.; Etzioni, Y.; Sosa, M.; Jung, A.; Clark, T.; Trepagnier, E.; Lithwick-Yanai, G.; Pollock, S.; Hornung, G.; Levy, M.; Coole, M.; Howd, T.; Shand, M.; Farjoun, Y.; Emery, J.; Hall, G.; Lee, S. K.; Sato, T.; Magner, R.; Low, S.; Bernier, A.; Gandi, B.; Stohlman, J.; Nolet, C.; Donovan, S.; Blumenstiel, B.; Cipicchio, M.; Dodge, S.; Banks, E.; Lennon, N.; Gabriel, S.; Lipson, D.

2022-08-10 genomics 10.1101/2022.05.29.493900 medRxiv

Top 0.1%

27.7%

We introduce a novel, massively parallel sequencing platform that combines an open flow cell design on a circular wafer with a large surface area and mostly natural nucleotides that allow optical end-point detection without reversible terminators. This platform enables sequencing billions of reads with longer read length (~300 bp) and fast run times (<20 hrs) with high base accuracy (Q30 > 85%), at a low cost of $1/Gb. We establish system performance by whole-genome sequencing of the Genome-In-A-Bottle reference samples HG001-7, demonstrating high accuracy for SNPs (99.6%) and Indels in homopolymers up to length 10 (96.4%) across the vast majority (>98%) of the defined high-confidence regions of these samples. We demonstrate scalability of the whole-genome sequencing workflow by sequencing an additional 224 selected samples from the 1000 Genomes Project, achieving high concordance with reference data.
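
The quality and cost figures quoted in this abstract (Q30, $1/Gb) can be unpacked with a short Python sketch. The Phred relation Q = -10 log10(p) is the standard definition of a base quality score. The genome size and coverage used in the cost example are assumptions chosen for illustration and do not come from the abstract.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def sequencing_cost_usd(genome_size_gb: float, coverage: float,
                        usd_per_gb: float = 1.0) -> float:
    """Raw sequencing cost for a target coverage at a given price per gigabase."""
    return genome_size_gb * coverage * usd_per_gb


# Q30 corresponds to roughly one expected error per 1,000 bases.
q30_error = phred_to_error_probability(30)

# Assumed inputs: a ~3.1 Gb genome sequenced at 30x coverage, at $1/Gb.
example_cost = sequencing_cost_usd(genome_size_gb=3.1, coverage=30)
print(q30_error, example_cost)
```

Read this way, "Q30 > 85%" means that more than 85% of bases carry an estimated error probability of at most about 0.001.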

15

Sub-cellular Imaging of the Entire Protein-Coding Human Transcriptome (18933-plex) on FFPE Tissue Using Spatial Molecular Imaging

Khafizov, R.; Piazza, E.; Cui, Y.; Patrick, M.; Metzger, E.; McGuire, D.; Dunaway, D.; Danaher, P.; Hoang, M. L.; Grootsky, A.; Vandenberg, M.; He, S.; Liu, R.; McKean, M.; Rhodes, M.; Beechem, J. M.

2024-12-03 genomics 10.1101/2024.11.27.625536 medRxiv

Top 0.1%

27.5%

Single-cell RNA-seq revolutionized single-cell biology by providing a complete whole transcriptome view of individual cells. Regrettably, this was accomplished only for individual, tissue-dissociated cells. High-plex spatial biology has begun to recover the x, y, and z-coordinates of single cells, but typically at the expense of far less than whole transcriptome coverage. To solve this problem, Bruker Spatial Biology has developed a commercial-grade panel (CosMx® Spatial Molecular Imager Whole Transcriptome Panel (WTx)), using 37,872 imaging barcodes, capable of sub-cellular imaging of the entire human protein-coding transcriptome. The imaging barcodes are encoded with 156 bits of information (4 on-cycles and 35 dark-cycles per code), at a Hamming Distance of 4 from each other to achieve a very low false-code detection rate. Key to achieving this high-plex capability was the ability to manufacture imaging barcodes that require no in-tissue amplification (every barcode is manufactured under GMP to contain exactly 30 fluorescent dyes) and uniform, size-exclusion purified, extremely small imaging barcodes (~20 nm). A detailed study of six different human FFPE tissue types was performed (Colon, Pancreas, Hippocampus, Skin, Breast, Kidney), yielding over 5.4 billion transcripts from 2.7 million cells. We counted over 1,550 transcripts per cell on average and observed 900 unique genes per cell (measured as the median). Single fixed cells containing well over 10,000 subcellularly imaged transcripts were observed. Advancing single-cell imaging to the whole transcriptome level opens a single unified approach to accomplish essentially all single-cell experiments, both imaging and non-imaging. Depending upon the sample type (e.g. fixed cells, organoids, tissue sections, etc.), the transcripts per cell and genes per cell measured using the whole transcriptome panel often exceed those obtained by the highest-resolution single-cell RNA-seq, and the assay can be performed on a single 5 µm FFPE tissue section, with no dissociation bias (every cell is counted). Pathway analysis within the tumor bed of a colon adenocarcinoma sample found evidence of enrichment in pathways suggestive of an aggressive tumor type, and localized ligand-receptor analysis showed spatially restricted patterns related to adhesion, migration, and proliferation. The high-dimensional whole transcriptome data is streamed directly to a cloud-based Spatial Informatics Platform, allowing for the scalable processing of millions of single cells and billions of transcripts per operation. The WTx data are combined with high-resolution antibody-based cell-morphology imaging and data-driven machine-learning cell segmentation algorithms, to generate the most complete view of single cell and sub-cellular spatial biology that has ever been obtained.
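
The barcode design constraint mentioned in this abstract (codes at a Hamming distance of 4 from each other) can be checked with a few lines of Python. The sketch below uses short toy codes rather than the 156-bit codes described, and the bit-string representation is an assumption made for illustration. In general coding theory, a minimum distance of 4 means a single flipped bit can still be corrected and two flipped bits can be detected, which is consistent with the low false-code rate the abstract reports.

```python
from itertools import combinations
from typing import Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(codes: Sequence[str]) -> int:
    """Smallest Hamming distance between any two codes in the set."""
    if len(codes) < 2:
        raise ValueError("need at least two codes")
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


# Toy 8-bit codebook (hypothetical), each code with exactly two "on" bits.
toy_codes = ["11000000", "00110000", "00001100", "00000011"]
codebook_distance = min_pairwise_distance(toy_codes)
print(codebook_distance)
```

A candidate code would be admitted to the codebook only if its distance to every existing code stays at or above the chosen minimum.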

16

Genome-wide single-cell perturbation screens with VIPerturb-seq

Bradu, A.; Blair, J. D.; Grabski, I. N.; Mascio, I.; Lee, J.; McCormick, C.; Satija, R.

2026-02-14 genomics 10.64898/2026.02.12.705613 medRxiv

Top 0.1%

26.9%

CRISPR-based screening combined with single-cell sequencing (i.e. Perturb-seq) enables systematic mapping of genetic perturbations to molecular phenotypes. While Perturb-seq is well-suited to profile targeted subsets of regulators, scaling to genome-wide screens presents substantial cost and throughput challenges. Here we introduce VIPerturb-seq, a platform to facilitate routine genome-wide Perturb-seq experiments using probe-based detection workflows. We describe a split probe strategy for detection of genome-wide CRISPR libraries in fixed cells that enables (i) optional support for phenotypic enrichment of Very Important Perturbations (VIP) prior to single-cell profiling, and (ii) compatibility with combinatorial indexing workflows to further improve Perturb-seq throughput by 50-fold. Using a genome-wide CRISPRi library (GuEST-List), we demonstrate VIPerturb-seq on two genome-wide screens representing both unbiased and phenotypically enriched workflows. Our results demonstrate how the sensitivity, scalability, and efficiency of VIPerturb-seq can enable both individual labs with targeted research questions and large data generation platforms aiming to construct virtual cells.

17

Programmable kinetic barcoding for multiplexed RNA detection with Cas13a

Son, S.; Lyden, A.; Pitti, C. N.; Dextre, A.; Shu, J.; Stephens, S. I.; Fozouni, P.; Knott, G. J.; Smock, D. C.; Liu, T. Y.; Boehm, D.; Simoneau, C.; Kumar, R. G.; Doudna, J. A.; Ott, M.; Fletcher, D. A.

2025-07-06 infectious diseases 10.1101/2025.07.03.25328829 medRxiv

Top 0.1%

26.6%

Rapid identification of viral infections and specific variants in patient samples requires a simple and multiplexed RNA detection method that does not rely on DNA sequencing. Although recent direct detection assays based on CRISPR-Cas13a [1-4] offer rapid RNA detection by avoiding the reverse transcription and DNA amplification required by gold-standard PCR assays [5], these assays are not easily multiplexed to detect multiple viruses or variants without dividing the sample into separate reactions. Here we show that Cas13a acting on single target RNAs exhibits variable nuclease activity that depends on the interaction between the target RNA and crRNA. To exploit this feature for multiplexed detection, we devised a crRNA modification strategy that enables programmable tuning of Cas13a's nuclease enzymatic rates. Using a droplet-based Cas13a assay, we demonstrate that kinetic signatures can be harnessed to differentiate among respiratory viruses and SARS-CoV-2 variants in contrived and clinical samples. This kinetic barcoding strategy can be extended to additional RNA targets through simple modification of crRNAs.

18

CRISPRi with barcoded expression reporters dissects regulatory networks in human cells

Kim, J.; Muller, R. Y.; Bondra, E. R.; Ingolia, N.

2024-09-06 genomics 10.1101/2024.09.06.611573 medRxiv

Top 0.1%

26.6%

Genome-wide CRISPR screens have emerged as powerful tools for uncovering the genetic underpinnings of diverse biological processes. Incisive screens often depend on directly measuring molecular phenotypes, such as regulated gene expression changes, provoked by CRISPR-mediated genetic perturbations. Here, we provide quantitative measurements of transcriptional responses in human cells across genome-scale perturbation libraries by coupling CRISPR interference (CRISPRi) with barcoded expression reporter sequencing (CiBER-seq). To enable CiBER-seq in mammalian cells, we optimize the integration of highly complex, barcoded sgRNA libraries into a defined genomic context. CiBER-seq profiling of a nuclear factor kappa B (NF-κB) reporter delineates the canonical signaling cascade linking the transmembrane TNF-α receptor to inflammatory gene activation and highlights cell-type-specific factors in this response. Importantly, CiBER-seq relies solely on bulk RNA sequencing to capture the regulatory circuit driving this rapid transcriptional response. Our work demonstrates the accuracy of CiBER-seq and its potential for dissecting genetic networks in mammalian cells with superior time resolution.

19

CMS: Achieving Uniform and High-Quality Sequencing across Challenging Non-canonical Genomic Regions

Li, Q.; Liu, L.; Lin, Q.; Dan, X.; Jiang, Y.; Wei, Y.; Yang, M.; Peng, X.; Luo, W.; Wang, W.; Xu, D.; Huang, Z.; Sun, W.; Zhao, L.; Yan, Q.; Sun, L.; Feng, B.

2026-04-28 genomics 10.64898/2026.04.24.720553 medRxiv

Top 0.1%

26.5%

High-throughput sequencing is essential in modern biological research, yet low-complexity sequences remain challenging as they form structurally complex, non-canonical (non-B) DNA conformations that impede sequencing enzyme read-through. This leads to a long-standing trade-off: maximizing coverage introduces false positives (FP), while stringent filtering causes coverage loss and false negatives (FN). To address this, we developed CMS (Cross Mountains and Seas) on GeneMind sequencing platforms by optimizing their chemistry and enzymatic systems to traverse these secondary structures with high fidelity. Benchmarking across whole-genome (WGS) and whole-exome (WES) sequencing demonstrates that CMS addresses the trade-off by simultaneously enhancing both coverage uniformity and accuracy, notably achieving an approximately 100-fold reduction in low-coverage bins for WGS and a 70% reduction in FN insertions/deletions (INDELs) within complex non-B regions. Specifically, a synthetic G-quadruplex (G4) motif sequencing experiment demonstrates that CMS maintains a 1:1 strand ratio, effectively handling G4-induced biases where benchmarked platforms exhibit extensive depletion. These findings establish CMS as a reliable technology for the precise characterization of structurally challenging but functionally essential genomic regions.

20

Multiplex genomic recording of enhancer and signal transduction activity in mammalian cells

Chen, W.; Choi, J.; Nathans, J. F.; Agarwal, V.; Martin, B.; Nichols, E.; Leith, A.; Lee, C.; Shendure, J.

2021-11-05 genomics 10.1101/2021.11.05.467434 medRxiv

Top 0.1%

26.2%

Measurements of gene expression and signal transduction activity are conventionally performed with methods that require either the destruction or live imaging of a biological sample within the timeframe of interest. Here we demonstrate an alternative paradigm, termed ENGRAM (ENhancer-driven Genomic Recording of transcriptional Activity in Multiplex), in which the activity and dynamics of multiple transcriptional reporters are stably recorded to DNA. ENGRAM is based on the prime editing-mediated insertion of signal- or enhancer-specific barcodes to a genomically encoded recording unit. We show how this strategy can be used to concurrently genomically record the relative activity of at least hundreds of enhancers with high fidelity, sensitivity and reproducibility. Leveraging synthetic enhancers that are responsive to specific signal transduction pathways, we further demonstrate time- and concentration-dependent genomic recording of Wnt, NF-κB, and Tet-On activity. Finally, by coupling ENGRAM to sequential genome editing, we show how serially occurring molecular events can potentially be ordered. Looking forward, we envision that multiplex, ENGRAM-based recording of the strength, duration and order of enhancer and signal transduction activities has broad potential for application in functional genomics, developmental biology and neuroscience.